Towards practical classical processing for the surface code 
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The surface code is unarguably the leading quantum error correction code for 2-D nearest neighbor 
architectures, featuring a high threshold error rate of approximately 1%, low overhead implemen- 
tations of the entire Clifford group, and flexible, arbitrarily long-range logical gates. These highly 
desirable features come at the cost of significant classical processing complexity. We show how to 
perform the processing associated with an n x n lattice of qubits, each being manipulated in a real- 
istic, fault-tolerant manner, in 0{ri^) average time per round of error correction. We also describe 
how to parallelize the algorithm to achieve 0(1) average processing per round, using only constant 
computing resources per unit area and local communication. Both of these complexities are optimal. 



Quantum computing promises exponentially faster 
processing of certain problems, including factoring |lj 
and simulating quantum physics 2]. Many quantum al- 
gorithms are now known [3] . The primary challenges are 
to mitigate and cope with the imperfections of quantum 
devices. The surface code supports a powerful quan- 
tum computing scheme [6|-|8| featuring an experimentally 
realistic threshold error rate of approximately 1% and 
requiring only a 2-D square lattice of qubits with nearest 
neighbor interactions. In this work, we describe how to 
perform the complex classical processing associated with 
the full, fault-tolerant scheme in a complexity-optimal 
manner. Using the current version of our code, we can 
simulate the fault-tolerant operation of millions of qubits, 
four orders of magnitude more than in any previous work. 
A detailed timing analysis can be found in jTo| . 

Previous works on the classical processing of topolog- 
ical quantum error correction (QEC) have obtained re- 
sults by making one of two significant modifications to 
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the problem. Large lattice sizes have been simulated 
however only by assuming that 4-qubit operators can be 
measured perfectly. Small lattices have been simulated 
fault-tolerantly 0, H, ll^ [isj , however these works used 
the code of Kolmogorov [IJ] which does not support con- 
tinuous processing of an arbitrary number of rounds of 
QEC. Our code now supports continuous fault-tolerant 
processing, using constant memory and with processing 
rate independent of the number of rounds. 

The surface code involves a 2-D lattice of qubits with 
local stabilizers Q • We shall initially assume perfect sta- 



bilizer measurement to compare with [11| . Error chain 



endpoints anticommute with stabilizers leading to -1 sta- 
bilizer measurements. Each -1 is associated with a vertex. 
Assuming independent errors, long error chains are expo- 
nentially unlikely. Edges between vertices vi = ji), 
V2 = (*2, ^2) are given a weight W12 to equal to their Man- 
hattan separation |i2~*i| + |j2— Edges to boundaries 
are given a weight equal to their length. 

We independently correct X and Z errors u sing the 
minimum weight perfect matching algorithm 1^ ij We 
use our own implementation which includes the concept 
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FIG. 1: a) Example of a weighted graph with edges connecting 
vertices to labeled boundaries bi. b) Set of edges such that 
every vertex is incident on exactly one edge and the total 
weight of all edges is minimal. 



of boundaries and returns edges such that every vertex 
is incident on exactly one edge and the total weight is 
minimal (Fig. [I}. Corrections are applied along the cho- 
sen edges. Logical errors occur when after correction a 
chain of errors connecting opposing boundaries remains. 
Provided error chains are well separated, this is unlikely. 

Clearly, a complete graph cannot be used if an O^n?) 
runtime is to be achieved as a complete graph has O(n^) 
edges. Simply generating a complete graph would there- 
fore take a minimum of 0{n'^) time. Progress can be 
made by noting that we are initially only considering 
data qubit errors and that a chain of errors between two 
-1 stabilizer measurements has length given by the Man- 
hattan metric. If we choose a vertex and imagine look- 
ing at the graph from that vertex, nearby vertices casts 
shadows, where we define a point in the plane to be in 
shadow if it can be reached by a minimum length path 
passing through another vertex (Fig. [5^). We define a 
vertex to be shadowed if it is in shadow yet neighbors 
an unshadowed point, and deeply shadowed if all neigh- 
boring points are shadowed. Empirically, we find that 
if two vertices are deeply shadowed when viewed from 
one another, there always exists a minimum weight per- 
fect matching that does not use an edge between them 
(Figs. [21d-c). This implies that such edges do not need 



2 



a) 



b) 



J 



FIG. 2: a) Plane as seen from vertex v. Vertices 1 and 2 are 
unshadowed, vertex 3 is shadowed, vertex 4 is deeply shad- 
owed, b) A matching with an edge to a deeply shadowed 
vertex, c) An equal weight rearrangement of the matching. 
A lower weight matching is possible. 




An example of a tight edge. Edge ei2 has the property 
5 — i;i — 7/0 — yi — Yo = 0. 



FIG. 3 

that wi2 - yi - y2 ~ Yi - Y2 = 



to be included in the graph that describes the problem. 
This cuts the number of edges down to O(n^). The va- 
lidity of this approach has been verified with millions of 
simulations finding identical weight matchings with both 
the complete and shadowed approaches. 

We now describe the minimum weight perfect match- 
ing algorithm, focusing on what is actually done. The 
original papers of Edmonds 1^ l3| beautifully prove that 
the method works. First, some definitions. Let G be a 
graph with vertices {vi}, edges {eij}, and edge weights 
{wij}. Associate with each vertex Vi a variable yi, which 
can be thought of as the radius of a ball centered at Vi. 
Odd sets of vertices can also be made into blossoms 
that have their own variables , which can be thought of 
as the width of shell around every object in Bk- If a pair 
of blossoms intersect, then one is contained in the other. 
Define an edge to be tight if Wij Hi — Vj — ^Yk ~ 0, 
where the sum is over k such that exclusively Vi or Vj is 
in Bk- This condition is pictorially depicted in Fig.[3l 

Define a node to be a vertex or blossom. Define a 
blossom to be unmatched if it contains an unmatched 
vertex. An alternating tree is a tree of nodes rooted on 
an unmatched node such that every path from the root 
to a leaf consists of alternating unmatched and matched 
edges. Alternating trees can only branch from the root 
and every second node from the root. Define branching 
nodes to be outer. Define all other nodes in the alternat- 
ing tree to be inner. Fig.|3]shows all necessary alternating 




FIG. 4: All required alternating tree manipulations, a) In- 
crease outer node and decrease inner node y values (or Y if 
the node is a blossom), maintaining the tightness of all tree 
edges and potentially creating new tight edges connected to 
at least one outer node, b) Inner blossoms with Y = can be 
expanded into multiple inner and outer nodes and potentially 
some nodes that are no longer part of the tree, c) Outer- 
matched tight edges can be used to grow the alternating tree, 
d) Outer-inner tight edges can be ignored, e) Outer-outer 
tight edges make cycles that can be used to make blossoms, 
f) When another unmatched vertex v is found, or an edge to 
a boundary b, the path from the unmatched vertex within the 
root node through the alternating tree to u or 6 is augmented, 
meaning matched edges become unmatched and unmatched 
edges become matched. This strictly increases the total num- 
ber of vertices. 



tree manipulations. 

Given a weighted graph G, the following algorithm 
finds a minimum weight perfect matching. 

1. If there are no unmatched vertices, return the list 
of matched edges. 

2. Choose an unmatched vertex v to be the root of 
alternating tree. 

3. If no edges emanating from the outer nodes of the 
alternating tree are tight, henceforth called 0-tight 
edges, increase the value of y or F associated with 
each outer node while simultaneously decreasing 
the value of y or y associated with each inner node 
until an edge becomes O-tight, or an inner blossom 
node Y variable becomes (Fig. . 

4. If an inner blossom node Y variable becomes and 
there are still no O-tight edges, expand that blos- 
som and return to 3 (Fig. HJd). 

5. Choose an O-tight edge e. 

6. If e leads to a matched node not already in the 
alternating tree, add the relevant unmatched and 
matched edge and associated nodes to the alternat- 
ing tree and return to 3 (Fig. H};). 

7. If e leads to an inner node, mark e so it is not con- 
sidered again during the growth of this alternating 
tree and return to 3 (Fig. |3Ji). 
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ideal threshold error rate of 10.9% obtained using com- 
putationally inefficient techniques [17|, |18| . Furthermore, 
by comparing the fraction of the threshold error rate at 
which our curves cut a logical error rate of 4 x 10"'^ to 
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Data qubit bit-flip probabiiity (p) 



FIG. 5: Logical X error rate versus data qubit bit-flip 
probability p for various code distances d assuming perfect 
stabilizer measurement. The threshold error rate is 10.25%. 



8. If e leads to an outer node, add the unmatched 
edge to the alternating tree. There will now be 
a cycle of odd length. Collapse this cycle into a 
new blossom and associate a new variable Y = {) 
(Fig.gjj). Return to 3. 

9. If e leads to an unmatched vertex or boundary, add 
e to the alternating tree and augment the path 
(unmatchedomatched) from the unmatched ver- 
tex within the root node to the end of e (Fig. |3f). 
Destroy the alternating tree, keeping any newly 
formed blossoms. Return to 1. 

On average, the algorithm only needs to consider a 
small local region around each vertex to find another 
unmatched vertex to pair with. This is a property of 
the graphs associated with topological QEC only, as the 
probability of needing to consider an edge of length I 
decreases exponentially with /. This ensures that the 
runtime is 0{v?). 

If we consider a standard square surface code with 
smooth boundaries top and bottom and rough bound- 
aries left and right [l], we can randomly apply bit- flips 
X with probability p to the data qubits, perfectly mea- 
sure the Z-stabilizers, construct a shadowed graph as de- 
scribed above, perform minimum weight perfect match- 
ing, apply corrections along the matched edges, and test 
for logical failure by checking whether there are an odd 
or even number of bit-flips along the top boundary. After 
correction, there can only be an odd number of bit-flips 
along the top boundary if a chain of bit-flips has formed 
from top to bottom boundary, indicating a logical error. 
The shortest topologically nontrivial chain is the distance 
d of the code {n = 2d — \). By performing many simula- 
tions, the probability of logical X error versus p can 
be plotted for a variety of distances (Fig. [5]). 

We observe a threshold error rate of 10.25%, versus 
8.2% in the work of [ll|, only slightly below the known 



the equivalent quantities in [ll|, it can be seen that our 



approach corrects twice as many errors at a given code 
distance, leading to a quadratic improvement of the log- 
ical error rate. Finally, the raw speed of our code is 
evident in our ability obtain data far below threshold. 

High performance when assuming perfect stabilizer 
measurements enables comparison with existing results, 
however this case is not particularly interesting in prac- 
tice. Only a fully fault-tolerant approach can be con- 
sidered practical. When using fault-tolerant circuits to 
measure stabilizers and applying depolarizing noise, the 
simple 2-D square lattice with Manhattan metric consid- 
ered above becomes a complicated 3-D lattice ^] that has 
both spatial boundaries and a temporal boundary repre- 
senting the latest round of stabilizer measurements. A 
number of modiflcations to the algorithm are required to 
account for these differences. 

Firstly, calculating the complete shadow of a vertex 
does not work very well in the 3-D lattice. There are 12 
outward directions to consider instead of 4 and the 
probability that all 12 directions are blocked by nearby 
vertices is low. Instead of exploring the entire unshad- 
owed region, which can be exceedingly large, one needs to 
set a maximum radius of initial exploration. With high 
probability, only this initial region is required. Regions 
further from the vertex are only explored as required (if 
another unmatched vertex or boundary is not found in 
the initial region). The probability of requiring a region 
of radius r decreases exponentially with r. In practice, 
we choose r just large enough to ensure nearest and next 
nearest neighbor lattice locations are explored initially. 

Secondly, the mobile temporal boundary introduces 
additional complexity. One must add new vertices to 
the problem as new data is obtained. This is straight- 
forward, essentially just increasing the list of unmatched 
vertices. However, when growing an alternating tree, it 
is possible for the tree to attempt to grow into the future. 
We chose to solve this problem by undoing the growth of 
the tree and all of the changes its growth introduced — 
essentially running the algorithm backwards. Growth is 
only reattempted when data from more rounds of error 
correction is available. 

Thirdly, detecting logical errors is more complex. One 
needs turn off errors, perform a perfect round of stabilizer 
measurement, match all vertices, apply corrections, de- 
tect logical errors as before by considering the top bound- 
ary, and then undo everything to return the simulator to 
its state before logical error detection. Note that this 
would not be done in a real computer. This procedure 
consumes negligible additional memory as a do/undo ap- 
proach is used, and takes approximately 10 times as long 
as a single round of standard error correction. 
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FIG. 6: Logical X error rate per round of error correction pl 
versus depolarizing probability p for various code distances 
d using fault-tolerant stabilizer measurement. The threshold 
error rate is 0.9%. 



Fourthly, it is nontrivial to ensure the amount of mem- 
ory used remains finite as the simulation proceeds. With 
vanishingly small probability, in principle the entire his- 
tory of stabilizer measurements is required to correctly 
match current time data. In practice, we keep track of 
the maximum distance in the past the algorithm has tra- 
versed to perform any matching, and store a fixed mul- 
tiple of this distance, discarding any older data. Close 
to the threshold error rate, and especially above thresh- 
old, extremely large blossoms are created, involving thou- 
sands of vertices and hundreds of blossom layers. Storing 
these blossoms dominates the memory cost. The growth 
of large blossoms, which is a gradual process, provides 
a warning that more data needs to be stored, ensuring 
reliable processing. 

With these modifications in mind, fault-tolerant simu- 
lation data is shown in Fig. [51 All data points represent 
one or more simulation instances run continuously until 
a total of 10,000 logical errors were observed, enabling 
reliable determination of the probability of logical error 
per round of error correction. Note that the threshold er- 
ror rate is 0.9%, in contrast to prior work that estimated 
it at 1.1% [9]. This highlights the danger of estimat- 
ing thresholds from small distance data only. The new 
code exactly reproduces the curves of the old code up to 
d — 13, however the true threshold error rate is really 
only visible for d > 21, distances at which the old code 
can not be run. This lowered threshold can be attributed 
to boundary effects, since boundary stabilizers are lower 
weight and hence more reliable. The physical error rates 
at which a factor of 2 (10) improvement in logical error 
rate is observed when the code distance is increased by 2 
remain 0.5% (0.2%) respectively, so this threshold error 
rate change is of no practical significance. 

Memory limitations prevented the gathering of statis- 
tics near the threshold error rate at higher distances. 



however at p = 10~^ the vast blossoms that make high 
error rate correction so difficult do not occur, and it is 
straightforward to simulate distances as high as d = 1000 
— over 4 million qubits. Needless to say, no logical er- 
rors are observed in such a simulation. Each round of 
error correction takes under 3 seconds using a single core 
of an AMD Opteron, with each individual matching tak- 
ing tens of microseconds. Given the algorithm uses only 
local information, it is in principle straightforward to par- 
allelize, using a 2-D array of processors with each proces- 
sor handling a fixed size patch of code. Inserting a small 
pause after each patch correction enables any patch with 
a randomly harder matching to catch up after lagging 
behind. This would enable the classical processing of 
an infinite lattice in constant average time per round of 
error correction, with the average time approaching the 
time for a single matching in the limit of low p and high 
classical computing resources. 

In summary, we have introduced an algorithm that 
finds a minimum weight perfect matching in 0{n^) time 
given a graph generated by an n x n lattice of qubits 
running the surface code fault-tolerantly. This algorithm 
parallelizes to 0(1) on an infinite lattice with constant 
computing resources per unit area. It is conceivable 
that a parallel implementation could achieve hundred mi- 
crosecond processing of a round of error correction, suf- 
ficient to keep pace with ion trap quantum gates (T^]. 
Additional ideas, including implementation in hardware, 
would be required to achieve the submicrosecond process- 
ing times required to keep pace with faster gates such as 
those found in superconducting circuits (20j . 
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